Skip to content

perf(docker): cache cargo+ccache and drop redundant maturin step (-51% rebuild on .py change)#1885

Merged
qin-ctx merged 2 commits into
volcengine:mainfrom
t0saki:perf/dockerfile-cache-mounts
May 7, 2026
Merged

perf(docker): cache cargo+ccache and drop redundant maturin step (-51% rebuild on .py change)#1885
qin-ctx merged 2 commits into
volcengine:mainfrom
t0saki:perf/dockerfile-cache-mounts

Conversation

@t0saki
Copy link
Copy Markdown
Collaborator

@t0saki t0saki commented May 7, 2026

Description

Speed up Docker rebuilds when only Python source changes by adding BuildKit cache mounts to the heavy py-builder stage and removing a redundant second maturin build step.

In the current Dockerfile, any change inside openviking/ invalidates step 11 (COPY openviking/ openviking/), which cascades through the two heavy RUN steps:

  • step 15 (uv sync --no-editable) — triggers setup.py to rebuild the ov Rust CLI, ragfs-python (via maturin), and the C++ vector engine via cmake.
  • step 16 — reinstalls maturin and rebuilds ragfs-python a second time, overwriting the .so already installed in the venv.

Neither rerun benefits from any persistent cache, so a one-line .py edit costs ~10 minutes of native compilation that produces output identical to the previous build.

Related Issue

N/A

Type of Change

  • Performance improvement

Changes Made

  • Add BuildKit cache mounts to step 15 for /cargo-target, /usr/local/cargo/registry, /usr/local/cargo/git, and /root/.ccache so cargo and gcc/g++ reuse work across rebuilds.
  • Pin CARGO_TARGET_DIR=/cargo-target so the path stays stable when uv builds wheels in ephemeral isolated tempdirs.
  • Install ccache and prepend /usr/lib/ccache to PATH so cmake's shutil.which("gcc") resolves to the ccache wrapper.
  • Drop step 16 entirely. setup.py's build_ragfs_python_artifact already runs maturin during step 15 (uv invokes setuptools build_meta with bdist_wheel in argv, so _should_require_ragfs_artifact() returns True and the build fails closed if the .so can't be produced). The .so is bundled into the wheel via package_data and installed into /app/.venv on wheel install.

Why step 16 was introduced — and why it became redundant

Step 16 was introduced in #1221 (the agfs → ragfs(Rust) rewrite) and was correct at that time. Two follow-up PRs since then made it obsolete:

Date PR Author Effect on step 16
2026-04-05 #1221 @MaojiaSheng Introduces step 16. At this commit, pyproject.toml's build-system.requires does not list maturin, and setup.py's build_ragfs_python_artifact silently skips when shutil.which(\"maturin\") returns None. So step 15's isolated wheel build env has no maturin → ragfs is skipped → the wheel contains no .so → runtime import would fail. Step 16 is the necessary fallback: explicitly install maturin, build ragfs, extract the .so into the installed venv.
2026-04-09 #1307 @jiahui Zhou Adds \"maturin>=1.0,<2.0\" to pyproject.toml's build-system.requires (commit message: "build: move ragfs-python packaging into setup.py"). From this point on, step 15's isolated build env already has maturin, so build_ragfs_python_artifact runs successfully inside the wheel build. Step 16 becomes a no-op overwrite of the .so that step 15 already installed.
2026-04-15 #1466 @qin Haojie Adds _should_require_ragfs_artifact() so wheel builds fail closed if ragfs_python.abi3.so can't be produced. This upgrades "step 15 produces the .so" from an empirical observation to a code-enforced contract.

So step 16's original safety role has been migrated into setup.py + pyproject.toml since #1307, in a strictly more robust way (the wheel itself fails closed; no separate post-install patching needed). This PR removes the now-redundant fallback.

Testing

End-to-end verified on the deployment build server (ARM Neoverse-N1, 4 vCPU @ 2.0 GHz, QEMU virt, Debian 13 aarch64 host, LXC container with 12 GiB RAM):

build step 15 step 16 total vs baseline
baseline (no perf changes) 510s 113s 12m22s
this PR, cold (cache empty) 354s (deleted) 9m32s -23%
this PR, hot (.py edit, cache hit) 240.8s (deleted) 6m02s -51%

Local sanity check on a developer laptop (Apple M4 Pro, 14-core / 48 GiB host, Linux VM via colima with 14 vCPU / 16 GiB):

build step 15 total
this PR, cold (after docker builder prune -a) 305.6s 6m21s
this PR, hot (.py edit) 87.3s 2m12s

Runtime correctness (built image actually starts and serves):

  • ls /app/.venv/lib/python3.13/site-packages/openviking/lib/ragfs_python.abi3.so (11.8 MB) is present, written by step 15's wheel install (no longer overwritten by the deleted step 16).

  • python -c \"from openviking.lib import ragfs_python; print(ragfs_python.__file__)\" resolves to the venv path with no errors.

  • docker compose up -d + curl /health{\"status\":\"ok\",\"healthy\":true,\"version\":\"0.3.15.devN\",\"auth_mode\":\"api_key\"}.

  • I have added tests that prove my fix is effective or that my feature works

    • N/A — build-only change with no functional behavior added or removed. Validated via the timing tables and the runtime import / health check above.
  • New and existing unit tests pass locally with my changes

    • Pytest is unaffected; the change touches only Dockerfile.
  • I have tested this on the following platforms:

    • Linux (ARM aarch64 build server — full build + runtime verification)
    • macOS (Apple Silicon / colima Linux VM, build only)
    • Windows (not tested; Dockerfile path)

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (no doc impact)
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Cache mounts were chosen over restructuring COPY order because:

  • A COPY split + uv sync --reinstall-package openviking does not avoid recompilation: setup.py's build_extension (cmake-driven C++ engine build) does not honor any skip env var, so a Python-only reinstall would still rerun cmake. Adding such a flag would expand the diff into setup.py and increase review surface.
  • Cache mounts are reproducibility-neutral (they affect speed only, not the wheel/image bits), and a fresh BuildKit cache produces a build identical to the previous behavior.

The largest remaining cost inside step 15 is uv wheel packaging + cmake configure (~3–4 min on the build server, ~80 s locally) that cache mounts cannot address. Reaching ~1 minute hot rebuilds would require switching to an editable install or splitting native vs Python source into separate COPY layers, which is out of scope for this PR.

t0saki added 2 commits May 7, 2026 13:15
The two heavy RUN steps in py-builder (uv sync + maturin build) re-execute
on every Python source change because the upstream COPY layer for openviking/
invalidates the cache. Each rerun was ~510s + ~115s ≈ 10 min of wasted work
even though Rust/C++ source was unchanged.

Add BuildKit cache mounts so cargo and the C++ engine compilation can skip
work whose inputs are unchanged:

- Mount /cargo-target, cargo registry, and cargo git so cargo's incremental
  build artifacts persist across layer reruns. Pin CARGO_TARGET_DIR so the
  path stays stable when uv builds wheels in ephemeral isolated tempdirs.
- Install ccache and prepend /usr/lib/ccache to PATH so cmake (which calls
  shutil.which("gcc")) resolves the ccache wrapper. ccache is path-agnostic,
  so it benefits the cmake_build subdir even though setup.py recreates it
  in a fresh tempdir each wheel build.
- Mount /root/.ccache so the ccache hash store persists across reruns.

Expected: hot rebuilds on Python-only changes drop step 15 from ~510s to
~60-120s (uv wheel packaging overhead remains; cargo + g++ skip on cache hit).
The second RUN step in py-builder built ragfs-python a second time and
extracted its .so into the installed openviking package. This was
redundant: setup.py's build_ragfs_python_artifact() already runs maturin
during step 15 (uv sync --no-editable), and because build_meta passes
'bdist_wheel' through PEP 517, _should_require_ragfs_artifact() returns
True and the build fails closed if maturin can't produce ragfs_python.so.
The .so is then bundled into the wheel via package_data and installed
into /app/.venv on wheel install. The second step's only effect was to
overwrite the same file, costing ~115s per build.

Verified after the fact by inspecting the installed venv and importing
ragfs_python in the runtime container.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

(Review updated until commit 9523b9e)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ❌

1466 - Not compliant

Non-compliant requirements:

  • Fail release wheel builds when ragfs_python cannot be produced
  • Add wheel smoke tests
  • Honor CI-selected CC and CXX in setup.py
  • Switch Linux release builds to clang

1221 - Not compliant

Non-compliant requirements:

Rewrite agfs to ragfs with rust

1307 - Not compliant

Non-compliant requirements:

Fix ci

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 90
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

@t0saki t0saki marked this pull request as draft May 7, 2026 06:06
@t0saki t0saki marked this pull request as ready for review May 7, 2026 06:16
Copilot AI review requested due to automatic review settings May 7, 2026 06:16
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 9523b9e

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Docker image rebuild performance by adding BuildKit cache mounts for Rust/C++ compilation artifacts in the py-builder stage and removing a now-redundant post-install maturin build step, relying instead on the existing wheel-build contract in setup.py/pyproject.toml.

Changes:

  • Add BuildKit cache mounts for Cargo target/registry/git and ccache, and pin CARGO_TARGET_DIR to a stable path to maximize cache hits across rebuilds.
  • Install and enable ccache via PATH so CMake-invoked gcc/g++ calls use the ccache wrappers.
  • Delete the second maturin build step (previously extracting the .so into the venv), since wheel builds already require and bundle the ragfs_python artifact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@qin-ctx qin-ctx merged commit 65ffa78 into volcengine:main May 7, 2026
10 of 11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants